Rapid Solution of Large-scale Systems of Equations

by Olaf O. Storaasli (O.O.Storaasli@larc.nasa.gov or 804-864-2927)
for Workshop on the Role of Computers in Langley R&D (6-15-94) 

The analysis and design of complex aerospace structures requires the rapid
solution of large systems of linear and nonlinear equations, eigenvalue extraction
for vibration and flutter modes, structural optimization and design
sensitivity calculation. Computers with multiple processors and vector capabilities
can offer substantial computational advantages over traditional scalar computers
for these analyses. These computers fall into two categories: shared-memory
computers (e.g. Cray C-90) and distributed-memory computers (e.g. Intel
Paragon, IBM SP-2).

Shared-memory computers have only a few processors (16 on a Cray C-90),
which process vector instructions (simultaneous adds and multiplies) and
address a large memory. Information is shared among processors by referencing
a common variable in shared memory.

Distributed-memory computers may have thousands of processors, each with
limited memory. Explicit message-passing commands (e.g. send, receive) are
used to communicate information between processors. Such communication is
time-consuming, so algorithms need to be designed to run efficiently on
distributed-memory computers.

This presentation will cover general-purpose, highly efficient algorithms for
generation/assembly of element matrices, solution of systems of linear and
nonlinear equations, and design sensitivity analysis and optimization. These
algorithms are coded in FORTRAN for shared-memory computers, and many have been
adapted to distributed-memory computers. The capability and numerical
performance of these algorithms will be addressed.

O. Storaasli, D. Nguyen, M. Baddourah and J. Qin (1993) "Computational
Mechanics Analysis Tools for Parallel-Vector Supercomputers", AIAA/ASME/
ASCE/AHS/ASC 34th Structures, Structural Dynamics and Materials Conference
Proceedings, Part 2 (April 1993) pp. 772-778 (also Int'l J. of Computing
Systems in Engineering, Vol. 4, No. 2-4, 1993, pp. 349-354).

Dr. Olaf O. Storaasli is a senior research scientist in computational mechanics
at the NASA Langley Research Center, Hampton, Virginia. He began his career 
at Langley after receiving a Ph.D. degree in Engineering Mechanics from North 
Carolina State University in 1970. 

Long before parallel computers were commercially available, Dr. Storaasli led a 
hardware, software and applications team at NASA Langley Research Center to 
develop one of the first parallel computers, the Finite Element Machine. He has
authored over 80 works in computational structural mechanics including static 
and dynamic structural analysis, eigenvalue and optimization methods,
interdisciplinary analysis, data management, and parallel-vector structural 
analysis methods on supercomputers. He received the Floyd L. Thompson
Fellowship of NASA Langley Research Center for post-doctoral research at 
Norges Tekniske Høgskole in Trondheim, Norway, and Det Norske Veritas, Oslo,
during 1984-85 and has been invited back twice since. He received 5
NASA-wide and 8 Langley Achievement awards for outstanding work in 
Computational Structural Mechanics. These awards included significant 
contributions to the NASA Viking and Integrated Programs for Aerospace-Vehicle 
Design (IPAD) Projects as well as to the development of Relational Information 
Management (RIM), since developed into the commercial relational data-base 
software: R:BASE. In August, 1989, Cray Research selected the general-purpose 
matrix equation solution software, pvsolve, developed by Dr. Storaasli and his 
colleagues, to receive the GigaFLOP Performance Award. pvsolve was used to
solve the equations (9.2 billion floating point operations) in the Space
Shuttle Solid Rocket Booster structural analysis in six seconds elapsed time. His
recent research has resulted in methods to analyze a 172,400 equation (5,737 
bandwidth) refined model of a high speed civil transport and a 265,000 equation 
automobile (3,374 bandwidth) application in less than two minutes on the Cray C-90
and a method to generate and assemble structural stiffness matrices on the
Intel Delta at speeds 25 times that of one Cray C-90 processor. 






Rapid Solution of 
Large Systems of Equations 




Dr. Olaf Storaasli 

Computational Structures Branch 
Mail Stop 240 

NASA Langley Research Center 
Hampton, VA 23681 

Email: O.O.Storaasli@larc.nasa.gov
Phone: 804-864-2927 
FAX: 804-864-8912 



presented at 

Workshop on 

The Role of Computers in Langley R&D 

June 15, 1994, Reid Conference Center 
NASA Langley Research Center 



Objective 


• Faster, cheaper, better analysis/design
  of large-scale structures

• Develop algorithms to exploit high-performance computers

• Evaluate computational performance
Outline

Supercomputers & Structural Models
Structural Analysis

- Nodal Generation and Assembly
- Linear Equation Solvers

    Shared-memory computers
    Distributed-memory computers

- Nonlinear Equation Solvers
Structural Optimization
Design Sensitivity
Parallel-Vector Speedup
(relative to sequential code)

[Figure: vector speedup on the Cray C-90 with 16 processors (C-90-16): 16 x 20 = 320]
Typical Structural Analysis


• Generate mesh
  (nodes and elements)

• Assemble stiffness [K],
  mass [M], and load {p}

• Solve: [K]{u} = {p} for displacement, u

         [K]{φ} = λ [M]{φ} for modes, φ

• Repeat: multiple analyses for nonlinear & design

• Plot: u, stresses and vibration modes, φ
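
The assemble-and-solve steps can be illustrated with a minimal sketch. It assumes a hypothetical chain of identical one-dimensional springs fixed at one end, which is not a model from this presentation, and the names `assemble_chain` and `solve_dense` are illustrative only. The dense Gaussian elimination shown here is a stand-in for the banded and sparse solvers discussed later.

```python
def assemble_chain(n_elem, k):
    """Assemble the global stiffness [K] for n_elem springs in series.

    Node 0 is fixed, so the free unknowns are nodes 1..n_elem
    (stored at indices 0..n_elem-1).
    """
    n = n_elem
    K = [[0.0] * n for _ in range(n)]
    ke = [[k, -k], [-k, k]]  # 2x2 element stiffness matrix
    for e in range(n_elem):
        dofs = [e - 1, e]  # free-dof index of each element node (-1 = fixed)
        for a in range(2):
            for b in range(2):
                if dofs[a] >= 0 and dofs[b] >= 0:
                    K[dofs[a]][dofs[b]] += ke[a][b]
    return K


def solve_dense(K, p):
    """Solve [K]{u} = {p} by Gaussian elimination without pivoting.

    Adequate for this sketch because [K] is symmetric positive definite.
    """
    n = len(p)
    A = [row[:] for row in K]
    b = p[:]
    for i in range(n):                      # forward elimination
        for r in range(i + 1, n):
            factor = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= factor * A[i][c]
            b[r] -= factor * b[i]
    u = [0.0] * n
    for i in range(n - 1, -1, -1):          # back substitution
        s = b[i] - sum(A[i][c] * u[c] for c in range(i + 1, n))
        u[i] = s / A[i][i]
    return u


K = assemble_chain(n_elem=4, k=100.0)
p = [0.0, 0.0, 0.0, 10.0]   # load applied at the free end only
u = solve_dense(K, p)       # displacements grow linearly along the chain
```

For this chain each spring carries the full end load, so node i moves by i x (load / k).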
Performance Assessment Applications
• Nearly ideal parallel speedup
  (no interprocessor communication)

[Figure: time (sec) versus number of processors (Cray C-90: 1, 8; Intel Delta: 16 to 512) for the Mach 2.4 HSCT model: 7,868 elements, 16,152 equations, bandwidth 770]



Equation Solution Issues
(Time, memory, disk space, I/O)

• Iterative or direct?

• Banded or sparse?

• “In-core” or “out-of-core”?

Communication

• Broadcast or ring?

• OSF or SUNMOS?
Equation Solvers

• Iterative and Direct (function of application)

• LINPACK (MP LINPACK), LAPACK
  (needs full matrix for best performance)

• Banded Indefinite, nonsymmetric
  (requires pivoting)

• Banded Definite Symmetric
  (seldom occurs in practical structures)

• Skyline*, Variable-band*
  (DOT-product, SAXPY operations minimize time)

• Sparse*, Wavefront* (<5% nonzeros)

* node or equation reordering minimizes solution time
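
The effect of node reordering on skyline storage can be sketched as follows. This is an illustrative example under stated assumptions, not code from the solvers in this presentation: one unknown per node, a hypothetical six-node chain, and the made-up helper names `skyline_profile` and `renumber`. Skyline storage keeps each column of the upper triangle from its first nonzero row down to the diagonal, so the total number of stored coefficients depends on the numbering.

```python
def skyline_profile(n, edges):
    """Count coefficients kept by skyline storage of the upper triangle.

    edges lists the node pairs coupled by an element; column j is stored
    from its first nonzero row down to the diagonal.
    """
    first_row = list(range(n))            # the diagonal is always stored
    for i, j in edges:
        lo, hi = min(i, j), max(i, j)
        first_row[hi] = min(first_row[hi], lo)
    return sum(j - first_row[j] + 1 for j in range(n))


def renumber(edges, new_label):
    """Apply a node renumbering (old number -> new_label[old])."""
    return [(new_label[i], new_label[j]) for i, j in edges]


chain = [(i, i + 1) for i in range(5)]    # nodes 0-1-2-3-4-5 in a line

good = skyline_profile(6, chain)                                # natural order
poor = skyline_profile(6, renumber(chain, [0, 5, 1, 4, 2, 3]))  # scattered order
```

With the natural numbering 11 coefficients are stored; the scattered numbering needs 15 for the same structure. Fill-in during factorization without pivoting stays inside the skyline, so a smaller profile means less storage and fewer operations.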


Direct Equation Solvers 
• Iterative: slow, convergence not guaranteed

• Direct: complex coding (banded, sparse)

[Figure: time (sec) versus number of Intel Gamma processors (4, 8, 16, 32) for the Mach 2.4 HSCT model; curves: Iterative (PCG) and Direct (Gauss)]
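
For readers unfamiliar with the iterative method named in the figure, the following is a minimal serial sketch of a preconditioned conjugate gradient (PCG) iteration. It assumes a Jacobi (diagonal) preconditioner and a small dense test matrix chosen for illustration; the preconditioner and parallel implementation actually used in the presentation are not stated in the source.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def pcg(A, b, tol=1e-10, max_iter=200):
    """Jacobi-preconditioned conjugate gradient for symmetric positive
    definite A. Returns the solution and the number of iterations used."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                   # residual b - A x for x = 0
    m_inv = [1.0 / A[i][i] for i in range(n)]  # inverse of diag(A)
    z = [m_inv[i] * r[i] for i in range(n)]
    d = z[:]                                   # search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ad = matvec(A, d)
        alpha = rz / dot(d, Ad)
        x = [x[i] + alpha * d[i] for i in range(n)]
        r = [r[i] - alpha * Ad[i] for i in range(n)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        d = [z[i] + (rz_new / rz) * d[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


A_demo = [[4.0, 1.0, 0.0],
          [1.0, 3.0, 1.0],
          [0.0, 1.0, 2.0]]
b_demo = [1.0, 2.0, 3.0]
x_demo, iterations = pcg(A_demo, b_demo)
```

Each iteration needs only a matrix-vector product and a few dot products, which is why the method is simple to code, but the iteration count depends on the conditioning of the matrix.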



“Out-of-core” Direct Solver
- using Cray Solid State Disk -

• as fast as “in-core” solver

• memory used: 1.1 x bandwidth^2
  (or 24 x bandwidth + 6 x neq)

[Figure: time (sec) versus number of Cray processors (Y-MP, C-90) for “in-core” and “out-of-core” solution; Mach 2.4 HSCT, 16,152 equations, 3.5 billion operations]
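
To give a feel for these memory estimates, the sketch below evaluates them for the Mach 2.4 HSCT model (16,152 equations, bandwidth 770). The two out-of-core formulas, 1.1 x bandwidth^2 and 24 x bandwidth + 6 x neq, are the ones on this slide; the in-core estimate (equations x bandwidth) is taken from the solver table elsewhere in this presentation. The function name `solver_memory` is hypothetical, and the unit (number of stored coefficients) is an assumption because the source does not state one.

```python
def solver_memory(neq, bandwidth):
    """Storage estimates for three direct-solver variants.

    neq       -- number of equations
    bandwidth -- matrix bandwidth
    Returns a dict of estimated storage counts (unit assumed: coefficients).
    """
    return {
        "in-core": neq * bandwidth,
        "out-of-core": 1.1 * bandwidth ** 2,
        "out-of-core+": 24 * bandwidth + 6 * neq,
    }


mem = solver_memory(neq=16152, bandwidth=770)
for name, words in mem.items():
    print(f"{name:13s} {words:14,.0f}")
```

For this model the estimates are roughly 12.4 million, 0.65 million and 0.12 million respectively, which shows why the out-of-core variants matter when the matrix does not fit in memory.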
Industrial-Strength Equation Solver Application

Solver                  Memory                  Parallel (shared)       Parallel (distributed)

PVSOLVE (symmetric) *   equations x bandwidth   Yes (Cray C-90, etc)    Yes (Intel Paragon, IBM SP-1)
PVSOLVE-OOC             1.1 x bandwidth^2       Yes (Cray C-90, etc)    Not yet
PVSOLVE-OOC+            24 x bandwidth          No (Cray C-90, etc)     Not yet
VSS (Vector Sparse)     function of sparsity    No (Cray C-90, etc)     Not yet
PCG (Iterative)         ~ matrix nonzeros       Yes (Cray C-90, etc)    Yes (Intel Paragon, IBM SP-1)
LANZ (Eigensolver)      equations x bandwidth   Yes (Cray C-90, etc)    Yes (Intel Paragon)

NOTE: These solvers have been evaluated on real applications with up to 263,574 equations
and larger matrices with several million equations. PCG is slowest, VSS is fastest
(for large, sparse problems) and PVSOLVE-OOC is the best all-around parallel-vector
solver. PVSOLVE-OOC exploits Cray solid-state disk.

* special versions of PVSOLVE for unsymmetric and negative coefficient matrices
  solve panel flutter, CFD, nonlinear and optimization problems
Parallel-Vector Equation Solver
(PVSOLVE)

Shared Memory

• Cray GigaFLOP award

• “in-core” skyline and variable-band versions

• “out-of-core” versions: memory ~ 1.2 x bandwidth^2
  and 24 x bandwidth

• tuned for Crays (or shared-memory computer/workstation)

Distributed Memory

• in-core skyline version - Intel i860 or Paragon

• in-core row version - Intel i860 or Paragon, IBM SP-1

Conversion underway to TMC CM-5, Convex SPP-1 and Cray T3D

COMET, Ford, U. Virginia, IBM, Princeton, LLNL, NSF sites,
Convex, COMCO, NASA Lewis + several dozen sites
Parallel Geometric
Nonlinear Methods

• Newton-Raphson fastest on parallel-vector computers

  [K + Kg]{u} = {f}

[Figure: time (sec) versus number of Cray Y-MP processors; curves: modified Newton-Raphson (301 M), BFGS (354 M), Newton-Raphson (438 M)]
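
The difference between full and modified Newton-Raphson can be sketched on a single nonlinear equation. The example assumes a hypothetical hardening spring with residual k u + c u^3 - f, which is not the geometric nonlinear model of the presentation; full Newton-Raphson recomputes the tangent stiffness at every iteration, while the modified form reuses the initial tangent.

```python
def newton(residual, tangent, u0, modified=False, tol=1e-12, max_iter=200):
    """Newton-Raphson iteration for a scalar equation residual(u) = 0.

    modified=False: recompute the tangent at every iteration (full Newton).
    modified=True:  keep the tangent evaluated at u0 (modified Newton).
    Returns the solution and the number of iterations used.
    """
    u = u0
    kt = tangent(u0)
    for it in range(1, max_iter + 1):
        if not modified:
            kt = tangent(u)          # new tangent stiffness every iteration
        u = u - residual(u) / kt
        if abs(residual(u)) < tol:
            return u, it
    return u, max_iter


k, c, f = 1.0, 0.1, 1.0              # illustrative values only


def residual(u):
    return k * u + c * u ** 3 - f


def tangent(u):
    return k + 3.0 * c * u ** 2


u_full, n_full = newton(residual, tangent, 0.0)
u_mod, n_mod = newton(residual, tangent, 0.0, modified=True)
```

Both variants reach the same solution, but full Newton-Raphson needs fewer iterations at the cost of a new tangent (in a structural code, a new factorization) per iteration. Fast parallel-vector factorization shifts that trade-off toward the full method.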
Optimization Procedure


• Find aircraft minimum weight subject to
  displacement and stress constraints

• Nonlinear constrained optimization finds:

  • Direction: BFGS, Simplex-Linear
    Programming

  • Step size: Golden Block
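
The step-size search is named only as "Golden Block" on the slide and its details are not given. As a related illustration, the sketch below shows the classical serial golden-section search applied to a step size along a fixed direction. It is a swapped-in textbook technique, not the author's method, and the objective function, starting point and direction are made up for the example.

```python
import math


def golden_section(f, a, b, tol=1e-8):
    """Minimize a unimodal function f on the bracket [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0   # about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:                  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def objective(x):
    """Hypothetical objective with its minimum at (1, 2)."""
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2


x0 = [0.0, 0.0]          # current design point
direction = [1.0, 2.0]   # search direction, e.g. from a BFGS update


def along_line(s):
    return objective([x0[i] + s * direction[i] for i in range(2)])


step = golden_section(along_line, 0.0, 3.0)
```

Each pass reuses one of the two interior function values, so only one new function evaluation is needed per iteration.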
Minimize F(x_1, x_2, ..., x_n) is equivalent to

    F_1(x_1, x_2, ..., x_n) = 0
    F_2(x_1, x_2, ..., x_n) = 0
    ...
    F_n(x_1, x_2, ..., x_n) = 0

or

    Min (F_1^2 + F_2^2 + ... + F_n^2)

[Figure: results for 11,000 nonlinear equations versus number of Cray C-90 processors (1, 2, 4, 8, 16)]
7,868 Triangular Elements

1 Design Variable (skin thickness)

[Figure: comparison of Hand coded, ADIFOR and Finite Difference sensitivity calculations versus number of Intel Delta processors]
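
The three ways of obtaining a design sensitivity compared in the figure can be sketched on a toy problem. The example assumes a hypothetical axial bar whose end displacement is u = P L / (E w t), with thickness t as the single design variable. The small `Dual` class is a minimal forward-mode automatic differentiation stand-in written for this sketch; it is not ADIFOR, which is a source-transformation tool for Fortran.

```python
class Dual:
    """Minimal forward-mode dual number: a value and its derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

    __rmul__ = __mul__

    def __rtruediv__(self, num):
        # num / self, where num is a plain number
        return Dual(num / self.val, -num * self.der / self.val ** 2)


def tip_displacement(t, P=1000.0, L=2.0, E=70e9, w=0.1):
    """End displacement of an axial bar of thickness t (illustrative values)."""
    return (P * L) / (E * w * t)


def hand_coded(t, P=1000.0, L=2.0, E=70e9, w=0.1):
    """Analytic derivative du/dt, derived by hand."""
    return -(P * L) / (E * w * t ** 2)


t0 = 0.002
h = 1e-6 * t0                                   # finite-difference step

du_hand = hand_coded(t0)
du_fd = (tip_displacement(t0 + h) - tip_displacement(t0)) / h
du_ad = tip_displacement(Dual(t0, 1.0)).der     # derivative carried along
```

The automatic-differentiation result agrees with the hand-coded derivative to rounding error, while the forward finite difference carries a truncation error proportional to the step and needs an extra analysis per design variable.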
Conclusions

New algorithms for high-performance computers
perform well on large-scale applications:

• Nodal Matrix Generation and Assembly

• Equation Solvers: [K]{u} = {p}
  (linear, nonlinear, “out-of-core”, sparse)

• Structural Optimization

• Design Sensitivity

Operate on Cray, Paragon, IBM SP-1 and SP-2!
• Storaasli, O., Nguyen, D., Baddourah, M. and Qin, J.:
  "Computational Mechanics Analysis Tools for
  Parallel-Vector Supercomputers", AIAA/ASME/ASCE/AHS/ASC
  34th Structures, Structural Dynamics and Materials
  Conference Proceedings, Part 2, pp. 772-778, April 1993.

  also International Journal of Computing Systems in
  Engineering, Vol. 4, No. 2-4, 1993, pp. 349-354

• on MOSAIC-WWW (Langley Technical Report Server)

• Questions: O.O.Storaasli@larc.nasa.gov

• Free Videotape from: shuguez@nas.nasa.gov
  (Santa Huguez at 415-604-4632)



